Context-Specific and Multi-Prototype Character Representations

Authors

Xiaoqing Zheng

Jiangtao Feng

Mengxiao Lin

Wenqiang Zhang

Abstract

Unsupervised word representations have demonstrated improvements in predictive generalization on various NLP tasks. Much effort has been devoted to effectively learning word embeddings, but little attention has been given to distributed character representations, although such character-level representations could be very useful for a variety of NLP applications in intrinsically “character-based” languages (e.g. Chinese and Japanese). On the other hand, most existing models create a single-prototype representation per word, which is problematic because many words are in fact polysemous, and a single-prototype model is incapable of capturing the phenomena of homonymy and polysemy. We present a neural network architecture to jointly learn character embeddings and induce context representations from large data sets. The explicitly produced context representations are further used to learn context-specific and multiple-prototype character embeddings, particularly capturing their polysemous variants. Our character embeddings were evaluated on three NLP tasks (character similarity, word segmentation, and named entity recognition), and the experimental results demonstrated that the proposed method outperformed competing methods on all three tasks.
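To make the idea of a multiple-prototype representation concrete, the following is a minimal sketch and not the architecture described in the abstract. It assumes each character already has several prototype vectors and picks the one closest (by cosine similarity) to the average embedding of the surrounding characters. The toy vectors, the example character 行, and the helper names `context_vector` and `select_prototype` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy embeddings for context characters (2 dimensions for readability).
EMBEDDINGS = {
    "走": [1.0, 0.0],
    "路": [0.9, 0.1],
    "银": [0.0, 1.0],
    "钱": [0.1, 0.9],
}

# Hypothetical prototypes for the polysemous character 行:
# index 0 ~ "to walk", index 1 ~ "bank / firm".
PROTOTYPES = {"行": [[1.0, 0.0], [0.0, 1.0]]}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_vector(embeddings, context_chars):
    """Average the embeddings of the known context characters."""
    vectors = [embeddings[c] for c in context_chars if c in embeddings]
    if not vectors:
        raise ValueError("no known context characters")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_prototype(prototypes, context_vec):
    """Return the index of the prototype most similar to the context vector."""
    scores = [cosine(p, context_vec) for p in prototypes]
    return max(range(len(prototypes)), key=lambda i: scores[i])


walk_sense = select_prototype(PROTOTYPES["行"], context_vector(EMBEDDINGS, ["走", "路"]))
bank_sense = select_prototype(PROTOTYPES["行"], context_vector(EMBEDDINGS, ["银", "钱"]))
```

With these toy values, a context of 走 and 路 selects prototype 0 and a context of 银 and 钱 selects prototype 1. A single-prototype model would have to assign the same vector to 行 in both contexts.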

Similar resources

Groups with Two Extreme Character Degrees and their Minimal Faithful Representations

For a finite group G, we denote by p(G) the minimal degree of faithful permutation representations of G, and by c(G) the minimal degree of faithful representations of G by quasi-permutation matrices over the complex field C. In this paper we assume that G is a p-group of exponent p and class 2, where p is prime and cd(G) = {1, |G : Z(G)|^1/2}. Then we will s...

THE RATIONAL CHARACTER TABLE OF SPECIAL LINEAR GROUPS

In this paper we give the character table of the irreducible rational representations of G = SL(2, q), where q = p^n, p prime, n > 0, by using the character table and the Schur indices of SL(2, q).

Multi-prototype Chinese Character Embedding

Chinese sentences are written as sequences of characters, which are elementary units of syntax and semantics. Characters are highly polysemous in forming words. We present a position-sensitive skip-gram model to learn multi-prototype Chinese character embeddings, and explore the usefulness of such character embeddings to Chinese NLP tasks. Evaluation on character similarity shows that multi-pro...

Characters in Search of an Author:

In this paper, we present the first results obtained with an interactive storytelling prototype. Our main objective is to develop flexible character-based systems, which nevertheless rely on narrative formalisms and representations. Characters’ behaviours are generated from plan-based representations, whose content is derived from narrative formalisms. We suggest that search based planning can ...

Corpus-level Fine-grained Entity Typing

This paper addresses the problem of corpus-level entity typing, i.e., inferring from a large corpus that an entity is a member of a class such as “food” or “artist”. The application of entity typing we are interested in is knowledge base completion, specifically, to learn which classes an entity is a member of. We propose FIGMENT to tackle this problem. FIGMENT is embedding-based and combines (i...


Publication date: 2016
